High tide in Venice

You are asked to analyze a dataset on high tides in Venice. The “Centro Previsioni e Segnalazioni Maree” uses these data to produce high-tide forecasts. The current forecast is reported below (in Italian):

General documentation

The analysis of high tides in Venice has a very long history. Forecasting them is a complex task handled by the Centro Previsioni e Segnalazioni Maree, located in Palazzo Cavalli, whose website contains interesting material you are encouraged to read.

If you are unfamiliar with this phenomenon, you may want to read this short booklet (ITA and ENG) to get an overview. This additional booklet (ITA) is also quite informative.

Dataset description

The data can be downloaded here (venice_meteo.zip). Inside the zip archive you will find several datasets, each corresponding to a different meteorological station:

Each station records different kinds of information, depending on the sensors installed there. The Centro Maree provides detailed documentation (ITA) of these datasets for each of the above locations; please refer to it for the description of the variables.
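A minimal sketch for loading all station files with pandas, assuming each station dataset is stored in venice_meteo.zip as a .csv file. The helper name load_zipped_csvs is hypothetical, and the default separator and encoding are assumptions to check against the raw files (CSV exports with an Italian locale may use ";" as separator).

```python
import zipfile

import pandas as pd


def load_zipped_csvs(zip_source, sep=",", encoding="utf-8"):
    """Read every .csv member of a zip archive into a dict of DataFrames.

    zip_source may be a path such as "venice_meteo.zip" or a binary
    file-like object. sep and encoding are assumptions: inspect the raw
    files and adjust them (e.g. sep=";") if the columns do not split.
    """
    frames = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip directory entries and non-CSV members (e.g. the JSON tide files).
            if not name.lower().endswith(".csv"):
                continue
            # Key each DataFrame by its file name, without folders or extension.
            key = name.rsplit("/", 1)[-1][:-4]
            with archive.open(name) as handle:
                frames[key] = pd.read_csv(handle, sep=sep, encoding=encoding)
    return frames
```

For example, frames = load_zipped_csvs("venice_meteo.zip", sep=";") would return one DataFrame per station, so each can be inspected with frames[key].head() before merging.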

Map of the stations

There are also three additional files:

  • Meteorological data (Dati_Meteo.csv)
  • Astronomical tide (astronomical_tide_2022.json and astronomical_tide_2023.json)

The station data and the meteorological data overlap considerably (ITA and ENG). The astronomical tide values are computed by a mathematical model, as described here (ITA and ENG).
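The astronomical tide comes as two JSON files. The sketch below assumes, hypothetically, that each file is a list of flat records holding a timestamp and a tide level; the key names are parameters because the real ones must be read from the files themselves. load_astronomical_tide is an illustrative name, not part of any library.

```python
import json
import os

import pandas as pd


def load_astronomical_tide(source, time_key, value_key):
    """Load one astronomical-tide JSON file into a time-indexed Series.

    Assumes (hypothetically) a list of flat records such as
    [{time_key: "2022-11-15 00:00", value_key: 42}, ...]; adapt the
    parsing if the real files are laid out differently.
    """
    if isinstance(source, (str, os.PathLike)):
        with open(source, encoding="utf-8") as handle:
            records = json.load(handle)
    else:
        records = json.load(source)  # already-open text file-like object
    df = pd.json_normalize(records)
    df[time_key] = pd.to_datetime(df[time_key])
    # One value per timestamp, sorted chronologically, with an English name.
    return df.set_index(time_key)[value_key].sort_index().rename("astronomical_tide")
```

The two years can then be stacked with pd.concat and any timestamp present in both files dropped with tide[~tide.index.duplicated()]. Printing the result of json.load on a real file first is the quickest way to confirm its layout.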

Homework rules

  • You will work in groups.
  • You need to submit the following files:
    • A Python notebook (tidy_homework_group_name.ipynb) that takes the files in venice_meteo.zip as input and outputs a clean dataset;
    • A comma-separated file which contains the tidy dataset (tidy_dataset_group_name.csv) produced by the above Python notebook.

You are asked to create a single dataset whose rows are hourly observations from November 15th, 2022, to January 31st, 2023, combining all the information that has been made available. At the very least, you should:

  • Understand how to import .json files using pandas;
  • Merge all the above datasets;
  • Delete redundant variables and translate the names of the remaining ones into English;
  • Exclude observations outside the period from 2022-11-15 to 2023-01-31;
  • Aggregate some variables, so that observations are recorded hourly rather than every 5 minutes. You will need to think about an appropriate aggregation method for each variable (sum, mean, max, min, and median are possible candidates);
  • Export the tidy dataset into a file tidy_dataset_group_name.csv.
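The merging, filtering, renaming, and aggregation steps above can be combined roughly as follows. This is a sketch under stated assumptions: each station DataFrame is assumed to have a timestamp column parseable by pd.to_datetime, and the helper names to_hourly and merge_hourly, the column names, and the aggregation choices are illustrative. Each hourly row summarizes the interval from hh:00 up to (not including) hh+1:00, which is the pandas default for hourly resampling; whether this matches the Centro Maree timestamp convention should be checked.

```python
import pandas as pd

# Study period from the assignment; both ends are inclusive.
START = pd.Timestamp("2022-11-15 00:00")
END = pd.Timestamp("2023-01-31 23:00")


def to_hourly(df, time_col, aggregations, rename=None, dayfirst=False):
    """Aggregate 5-minute observations into hourly rows for the study period.

    aggregations: maps each column to keep onto an aggregation name such as
        "sum", "mean", "max", "min" or "median". Columns not listed are
        dropped, which is one way to discard redundant variables.
    rename: optional mapping from original (Italian) to English names.
    dayfirst: set True if timestamps are written as dd/mm/yyyy.
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col], dayfirst=dayfirst)
    hourly = (
        out.set_index(time_col)
        .sort_index()
        # Left-closed, left-labeled bins: the 10:00 row covers 10:00 up to 11:00.
        .resample("h")
        .agg(aggregations)
    )
    hourly.index.name = "datetime"
    if rename:
        hourly = hourly.rename(columns=rename)
    # Label-based slicing on a sorted DatetimeIndex includes both endpoints.
    return hourly.loc[START:END]


def merge_hourly(frames):
    """Left-join hourly frames onto a complete hourly grid for the study period.

    Column names must be unique across frames (for example, prefixed with the
    station name); otherwise DataFrame.join raises a ValueError.
    """
    grid = pd.date_range(START, END, freq="h", name="datetime")
    merged = pd.DataFrame(index=grid)
    for frame in frames:
        merged = merged.join(frame, how="left")
    return merged
```

For instance, rainfall could be summed, temperature averaged, and wind gusts aggregated with "max"; the astronomical tide Series can be passed to merge_hourly as tide.to_frame(). If the sources use different time zones, they need to be aligned before joining. The final step is then merged.to_csv("tidy_dataset_group_name.csv").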

Any additional fixes and improvements are welcome, as long as they increase the “tidiness” of the final dataset tidy_dataset_group_name.csv.
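Before exporting, a few automated checks can help confirm that the dataset has the required shape. The sketch below is one possible set of checks, not an official grading criterion; tidiness_report is a hypothetical helper, and it assumes the timestamps are stored in the DataFrame index.

```python
import pandas as pd


def tidiness_report(df, start="2022-11-15 00:00", end="2023-01-31 23:00"):
    """Return simple checks on a candidate tidy dataset indexed by timestamp.

    Checks: exactly one row per hour over the study period, no duplicated
    timestamps or column names, no entirely empty column, and the overall
    share of missing values.
    """
    expected = pd.date_range(start, end, freq="h")
    index = pd.DatetimeIndex(df.index)
    return {
        "one_row_per_hour": index.equals(expected),
        "unique_timestamps": index.is_unique,
        "unique_columns": df.columns.is_unique,
        "empty_columns": [c for c in df.columns if df[c].isna().all()],
        "missing_share": float(df.isna().mean().mean()),
    }
```

For example, after reading the exported file back with pd.read_csv("tidy_dataset_group_name.csv", index_col=0, parse_dates=True), the "one_row_per_hour" entry should be True: the period spans 78 days, i.e. 1,872 hourly rows.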